weakly supervised
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > Canada (0.05)
- Asia > Middle East > Jordan (0.04)
First provide a summary of the paper, and then address the following criteria: quality, clarity, originality, and significance. The paper describes a method to identify image patches that are (a) diagnostic of particular objects, (b) not particularly redundant, and (c) cover the collection of diagnostic patches well. The method applies to the weakly supervised case, where images are known to contain the object(s) of interest but the locations of these objects are not known. This is a very well studied topic. Once these patches have been identified, related pairs are found by a mining process.
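The three criteria the review summarizes (diagnostic, non-redundant, good coverage) admit a simple greedy reading. The sketch below is an illustrative stand-in, not the paper's actual algorithm: `select_patches`, the score/similarity inputs, and the greedy trade-off are all assumptions made here for clarity.

```python
def select_patches(scores, similarity, k, redundancy_weight=0.5):
    """Greedily pick k patch indices that score high (diagnostic) while
    penalizing similarity to already-selected patches (non-redundant).
    Hypothetical sketch; the paper's exact selection scheme may differ."""
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def gain(i):
            # redundancy = closest match among patches already picked
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return scores[i] - redundancy_weight * redundancy
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with two near-duplicate high-scoring patches and one moderate distinct patch, the greedy pass keeps one of the duplicates and the distinct patch rather than both duplicates.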
- Europe > France (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Ukraine (0.04)
Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection Supplementary Material
Bottom: CASD overlaid with attentions. Recall that WSOD conducts classification on object proposals (e.g., bounding boxes generated by Selective Search). Figure 1 shows both the success and the failure cases of CASD; the failure cases could be improved by hard-sample mining in CASD training. The localization advantages of CASD come from its learning of comprehensive attention (see the bottom row of Figure 1). CorLoc only evaluates the localization accuracy of detectors.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > Canada (0.05)
Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension
This paper explores the weakly-supervised referring image segmentation (WRIS) problem, and focuses on a challenging setup where target localization is learned directly from image-text pairs. We note that the input text description typically already contains detailed information on how to localize the target object, and we also observe that humans often follow a step-by-step comprehension process (i.e., progressively utilizing target-related attributes and relations as cues) to identify the target object. Hence, we propose a novel Progressive Comprehension Network (PCNet) to leverage target-related textual cues from the input description for progressively localizing the target object. Specifically, we first use a Large Language Model (LLM) to decompose the input text description into short phrases. These short phrases are taken as target-related cues and fed into a Conditional Referring Module (CRM) in multiple stages, to allow updating the referring text embedding and enhance the response map for target localization in a multi-stage manner. Based on the CRM, we then propose a Region-aware Shrinking (RaS) loss to constrain the visual localization to be conducted progressively in a coarse-to-fine manner across different stages. Finally, we introduce an Instance-aware Disambiguation (IaD) loss to suppress instance localization ambiguity by differentiating overlapping response maps generated by different referring texts on the same image. Extensive experiments show that our method outperforms SOTA methods on three common benchmarks.
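The multi-stage comprehension the abstract describes (LLM decomposition into phrase cues, then stage-wise refinement through the CRM) can be sketched as a control-flow skeleton. All names here (`decompose`, `crm`, `progressive_localization`) are hypothetical stand-ins for the paper's components, not its actual API.

```python
def progressive_localization(text, image_feats, decompose, crm, num_stages=3):
    """Illustrative skeleton of PCNet-style progressive comprehension.
    `decompose` stands in for the LLM phrase decomposition; `crm` stands
    in for the Conditional Referring Module, which updates both the text
    embedding and the response map at each stage."""
    cues = decompose(text)            # referring text -> short phrase cues
    embedding, response = text, None
    for stage in range(min(num_stages, len(cues))):
        # each stage conditions on one target-related cue and refines
        # the response map produced by the previous stage
        embedding, response = crm(embedding, cues[stage], image_feats, response)
    return response
```

The point of the skeleton is the coarse-to-fine loop: later stages see both a richer text embedding and the previous stage's response map, which is what the RaS loss then constrains to shrink toward the target.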
TeD-Loc: Text Distillation for Weakly Supervised Object Localization
Murtaza, Shakeeb, Belharbi, Soufiane, Pedersoli, Marco, Granger, Eric
Weakly supervised object localization (WSOL) using classification models trained with only image-class labels remains an important challenge in computer vision. Given their reliance on classification objectives, traditional WSOL methods like class activation mapping focus on the most discriminative object parts, often missing the full spatial extent. In contrast, recent WSOL methods based on vision-language models like CLIP require ground truth classes or external classifiers to produce a localization map, limiting their deployment in downstream tasks. Moreover, methods like GenPromp attempt to address these issues but introduce considerable complexity due to their reliance on conditional denoising processes and intricate prompt learning. This paper introduces Text Distillation for Localization (TeD-Loc), an approach that directly distills knowledge from CLIP text embeddings into the model backbone and produces patch-level localization. Multiple instance learning of these image patches allows for accurate localization and classification using one model without requiring external classifiers. Such integration of textual and visual modalities addresses the longstanding challenge of achieving accurate localization and classification concurrently, as WSOL methods in the literature typically converge at different epochs. Extensive experiments show that leveraging text embeddings and localization cues provides a cost-effective WSOL model. TeD-Loc improves Top-1 LOC accuracy over state-of-the-art models by about 5% on both CUB and ILSVRC datasets, while significantly reducing computational complexity compared to GenPromp.
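A schematic reading of the TeD-Loc mechanism: patch embeddings aligned with CLIP text embeddings yield a localization map directly from patch-text similarity, and multiple-instance pooling over patches yields an image-level score. The cosine-similarity map and top-k mean pooling below are assumptions chosen for illustration; the paper's distillation loss and pooling operator are not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    du = math.sqrt(sum(x * x for x in u))
    dv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (du * dv)

def patch_scores(patch_embs, text_emb):
    """Patch-level localization map: similarity of each patch embedding
    to the class text embedding (hypothetical stand-in for distilled CLIP)."""
    return [cosine(p, text_emb) for p in patch_embs]

def mil_image_score(scores, k=2):
    """Multiple-instance pooling: mean of the top-k patch scores gives
    one image-level classification score, so a single model both
    localizes (per-patch map) and classifies (pooled score)."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)
```

This is why no external classifier is needed in such a design: the same patch-text similarities serve as both the localization map and, after pooling, the classification evidence.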
- North America > United States > California (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Research Report > Promising Solution (0.66)
- Research Report > New Finding (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition
Belharbi, Soufiane, Pedersoli, Marco, Koerich, Alessandro Lameiras, Bacon, Simon, Granger, Eric
Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (AUs) from a codebook with facial regions for the visual interpretation of expressions. In this paper, the same expert steps are followed. A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, allowing deep interpretable models to be trained. During training, this AU codebook is used, along with the input image's expression label and facial landmarks, to construct an AU heatmap that indicates the most discriminative image regions of interest w.r.t. the facial expression. This valuable spatial cue is leveraged to train a deep interpretable classifier for FER. This is achieved by constraining the spatial layer features of a classifier to be correlated with AU heatmaps. Using a composite loss, the classifier is trained to correctly classify an image while yielding interpretable visual layer-wise attention correlated with AU maps, simulating the expert decision process. Our strategy relies only on image expression labels for supervision, without additional manual annotations. It is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time. Our extensive evaluation on two public benchmarks, RAF-DB and AffectNet, shows that our proposed strategy can improve layer-wise interpretability without degrading classification performance. In addition, we explore a common type of interpretable classifier that relies on class activation mapping (CAM) methods, and show that our approach can also improve CAM interpretability.
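The "constraining layer features to be correlated with AU heatmaps" step can be sketched as a correlation-based alignment term added to the classification loss. This is a minimal sketch under the assumption that the alignment term is one-minus-Pearson-correlation over flattened maps; the paper's composite loss may weight or formulate this differently.

```python
import math

def pearson(a, b):
    """Pearson correlation between two flattened spatial maps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def au_alignment_loss(attention_map, au_heatmap):
    """Illustrative alignment term: 0 when the layer's attention is
    perfectly correlated with the AU heatmap, larger otherwise.
    A composite loss would add this to the usual classification loss."""
    return 1.0 - pearson(attention_map, au_heatmap)
```

An attention map that is a positive linear rescaling of the AU heatmap incurs zero alignment penalty, which matches the stated goal: the classifier is free in magnitude but pushed to attend where the AU codebook says an expert would look.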
Weakly Supervised Pretraining and Multi-Annotator Supervised Finetuning for Facial Wrinkle Detection
Moon, Ik Jun, Moon, Junho, Jang, Ikbeom
Analyzing extensive collections of images can be exceedingly resource-intensive if each facial wrinkle must be individually assessed. Moreover, the subjectivity inherent in manual segmentation can diminish the reliability of research findings and poses a substantial problem. To address this issue, we effectively combine wrinkle data labeled by multiple annotators to minimize inter-rater variability, and use these image-label pairs to train our model.
- Asia > South Korea > Seoul > Seoul (0.06)
- Asia > South Korea > Ulsan > Ulsan (0.05)